A set of new Chebyshev kernel functions for support vector machine pattern classification

Authors

  • Sedat Ozer
  • Chi Hau Chen
  • Hakan A. Çirpan
Abstract

Recently, the Chebyshev kernel has been proposed for SVM, and it has been proven to be a valid kernel for scalar-valued inputs in [11]. However, in pattern recognition many applications require multidimensional vector inputs, so there is a need to extend the previous work to vector inputs. In [11], although it is not stated explicitly, the authors recommend evaluating the kernel function on each element pair and then multiplying the outputs (see Section V). However, since kernel functions are defined as the inner product of two given vectors in the higher dimensional space for SVM, and since a kernel function provides a measure of the similarity between two vectors, it would be expected that applying the kernel to vector inputs directly, instead of applying it to each input element (feature), would yield better generalization ability, as we discuss in the following sections. Therefore, in this study we propose generalized Chebyshev kernels by introducing vector Chebyshev polynomials. Using the generalized Chebyshev polynomials, we construct a new family of kernel functions and show that they are more robust than the ones presented in [11]. In experiments, the proposed generalized Chebyshev kernel function gives its best performance within a small range of integer values of the kernel parameter. This property can be used to construct SVMs where semi-parametric kernel functions are needed, by choosing the kernel parameter from a small set of integers. In this study, we also show how to construct different kernel functions by using the generalized Chebyshev polynomials. In order to capture highly nonlinear boundaries in the Euclidean space, the weighting function can be modified. Therefore, we modify the generalized Chebyshev kernel for better accuracy by replacing the weighting function with an exponential function and propose the modified Chebyshev kernel function. In simulations, we also compare the proposed kernel functions to other commonly used kernel functions. Experimental results show that the proposed set of new kernel functions, on average, performs better than all other kernel functions used in the tests. Moreover, during the tests we observed that the generalized Chebyshev kernel function generally approaches the minimum support vector (SV) number. This property can be useful for researchers who need to reduce the number of support vectors for their datasets.

2. Support vector machine

The fundamentals of SVM can be traced back to statistical learning theory [1]. However, in its current form, SVM is a deterministic supervised learning algorithm rather than a statistical learning method. SVM has several variations based on its cost function, as in [1,6], but regardless of these variations, its fundamental form is based on the idea of inserting a hyperplane between the two (binary) classes, which can be done either in the current data space (linear SVM) or in a higher dimensional space by using kernel functions (nonlinear SVM). SVM uses the following formula to find the label of a given test sample [1,12]:

$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{k} \alpha_i y_i K(\mathbf{x},\mathbf{x}_i) + b\right) \qquad (1)$$

where $\alpha_i$ is the nonzero Lagrange multiplier of the associated support vector $\mathbf{x}_i$, $k$ is the number of support vectors, $K(\cdot)$ is the kernel function, and $f(\mathbf{x})$ is the class label of the given test sample $\mathbf{x}$. The class labels $y_i$ corresponding to the support vectors $\mathbf{x}_i$ can only take binary values, i.e., $y_i \in \{-1,+1\}$, and $b$ is the bias value.
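As a concrete illustration of the decision rule in Eq. (1), the short Python sketch below evaluates $f(\mathbf{x})$ for an already trained model; the names `support_vectors`, `sv_labels`, `alphas`, `b`, and the `kernel` callable are hypothetical placeholders for the quantities defined above, not part of the original paper.

```python
import numpy as np

def svm_decision(x, support_vectors, sv_labels, alphas, b, kernel):
    """Evaluate Eq. (1): f(x) = sgn( sum_i alpha_i * y_i * K(x, x_i) + b ).

    support_vectors : (k, m) array of support vectors x_i
    sv_labels       : (k,) array of labels y_i in {-1, +1}
    alphas          : (k,) array of nonzero Lagrange multipliers alpha_i
    b               : scalar bias value
    kernel          : callable K(x, z) returning a scalar
    """
    s = sum(a * y * kernel(x, xi)
            for a, y, xi in zip(alphas, sv_labels, support_vectors))
    return np.sign(s + b)
```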
The $\alpha$ values are found by maximizing the following function:

$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j) \qquad (2)$$

subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$ and $\alpha_i \ge 0$, where $l$ is the number of training samples. The training input vectors $\mathbf{x}_i$ for which the corresponding Lagrange multiplier $\alpha_i$ is nonzero are called support vectors.

3. SVM kernel functions

An ideal SVM kernel function yields the inner product of two given vectors in a high dimensional vector space where all the input data can be linearly separated [1,12]. Therefore, the inner product of any pair of transformed vectors in the higher dimensional space can be found by applying the kernel function to the input vectors directly, without the need for an explicit transformation function $\phi(\cdot)$:

$$K(\mathbf{x},\mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle \qquad (3)$$

where $K(\cdot)$ is the kernel function. Some of the most common kernel functions are listed below.

Gaussian kernel [1,16]:

$$K(\mathbf{x},\mathbf{z}) = \exp\left(-\frac{\|\mathbf{x}-\mathbf{z}\|^2}{2\sigma^2}\right) \qquad (4)$$

Polynomial kernel [1]:

$$K(\mathbf{x},\mathbf{z}) = \left(\frac{\langle \mathbf{x},\mathbf{z}\rangle + 1}{b}\right)^{n} \qquad (5)$$

Wavelet kernel [7]:

$$K(\mathbf{x},\mathbf{z}) = \prod_{j=1}^{m} \cos\!\left(1.75\,\frac{x_j - z_j}{a}\right)\exp\!\left(-\frac{(x_j - z_j)^2}{2a^2}\right) \qquad (6)$$

where $\sigma$, $n$, and $a$ are the kernel parameters for the Gaussian, polynomial, and wavelet kernels, respectively, and $b$ is the scaling parameter for the polynomial kernel.

Kernel functions should be applied to input vectors directly rather than applied to each element and combined by a product, since kernel functions are supposed to provide a measure of the correlation of two input vectors in a higher dimensional space. If we consider the family of kernel functions in which the kernel is applied to each pair of elements individually, then for a given pair of input vectors $\mathbf{x}$ and $\mathbf{z}$ the element-wise kernel can be formulated as

$$K_j(\mathbf{x},\mathbf{z}) = \langle \phi(x_j), \phi(z_j) \rangle \qquad (7)$$

where $K_j(\cdot)$ is the kernel function evaluated on the $j$th elements of the vector pair $\mathbf{x}$ and $\mathbf{z}$. The final kernel value is then found as

$$K(\mathbf{x},\mathbf{z}) = \prod_{j=1}^{m} K_j(x_j, z_j) \qquad (8)$$

Both the previously proposed Chebyshev kernel and the wavelet kernel are constructed in the form of Eq. (8). This approach, however, may yield poor generalization ability. Consider two input vectors $\mathbf{x}$ and $\mathbf{z}$ that are relatively close to each other and belong to the same class. In that case one would expect the kernel function to return a relatively high value, but if a single pair of their elements yields a kernel value $K_j(x_j,z_j)$ close to zero, the whole product becomes very small, indicating that the two vectors have very little correlation. Thus, SVM is forced to learn along each element instead of along each vector. (This situation is illustrated in the spiral dataset experiments in Figs. 3, 6 and 10.) Therefore, we believe that kernel functions should give better results if they are applied to the input vectors directly, instead of being applied to each element pair first and then combining the results with a multiplication; the sketch below makes the product form of Eq. (8) explicit.
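To make the product construction of Eq. (8) concrete, the sketch below evaluates a per-element kernel and multiplies the factors, using the wavelet kernel of Eq. (6) as the element kernel; the function names are ours and the default value of `a` is an assumed example, not a value from the paper.

```python
import numpy as np

def wavelet_element_kernel(xj, zj, a=1.0):
    """Per-element factor of the wavelet kernel in Eq. (6)."""
    d = xj - zj
    return np.cos(1.75 * d / a) * np.exp(-d**2 / (2.0 * a**2))

def product_form_kernel(x, z, element_kernel=wavelet_element_kernel):
    """Eq. (8): evaluate the kernel on each element pair, then multiply.

    If any single factor is close to zero, the whole product collapses,
    even when the two vectors are otherwise similar.
    """
    return np.prod([element_kernel(xj, zj) for xj, zj in zip(x, z)])
```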
4. Chebyshev polynomials

Chebyshev polynomials are a set of orthogonal polynomials commonly used in many applications, including filtering. The orthogonal set of polynomials is denoted by $T_n(x)$, $n = 0, 1, 2, 3, \ldots$, for $x$ values in $[-1, 1]$. The Chebyshev polynomials of the first kind, $T_n(x)$, of order $n$, are defined as [13]:

$$T_0(x) = 1, \quad T_1(x) = x, \quad T_n(x) = 2x\,T_{n-1}(x) - T_{n-2}(x) \qquad (9)$$

The Chebyshev polynomials of the first kind are orthogonal with respect to the weighting function $1/\sqrt{1-x^2}$; therefore, for two given Chebyshev polynomials integrated over the interval $[-1, 1]$ we have [13]:

$$\int_{-1}^{1} \frac{T_i(x)\,T_j(x)}{\sqrt{1-x^2}}\,dx = \begin{cases} 0 & i \neq j \\ \pi/2 & i = j \neq 0 \\ \pi & i = j = 0 \end{cases} \qquad (10)$$

Although we do not yet have an analytical proof, during the simulations, while exploiting the properties of the Chebyshev polynomials for large $n$ values, we noticed that the Chebyshev polynomials appear to satisfy the following property:

$$\sum_{i=0}^{n} \frac{T_i(x)\,T_i(z)}{\sqrt{1-xz}} < \sum_{i=0}^{n} \frac{T_i(x)\,T_i(x)}{\sqrt{1-x^2}} \qquad (11)$$

where $x$ and $z$ are scalars.

5. Chebyshev kernel: previous work

The Chebyshev kernel for given scalar-valued inputs $x$ and $z$ is defined as [11]:

$$K(x,z) = \frac{\sum_{i=0}^{n} T_i(x)\,T_i(z)}{\sqrt{1-xz}} \qquad (12)$$

For instance, for scalar values, the 3rd order orthogonal Chebyshev kernel is

$$K(x,z) = \frac{1 + xz + (2x^2-1)(2z^2-1) + (4x^3-3x)(4z^3-3z)}{\sqrt{1-xz}} \qquad (13)$$

As the Chebyshev polynomials are orthogonal only within the region $[-1, 1]$, the input data need to be normalized to this region according to the following formula:

$$x \leftarrow \frac{2(x - \mathrm{Min})}{\mathrm{Max} - \mathrm{Min}} - 1 \qquad (14)$$

where Min and Max are the minimum and maximum values of the entire data, respectively. Although it is not clear how the kernel function was applied to vector inputs in the experiments of [11], the computer code sent by the authors of [11] allows us to derive the following equation for the Chebyshev kernel:

$$K(\mathbf{x},\mathbf{z}) = \prod_{j=1}^{m} \frac{\sum_{i=0}^{n} T_i(x_j)\,T_i(z_j)}{\sqrt{1-x_j z_j}} \qquad (15)$$

where $m$ is the dimension of the training vectors $\mathbf{x}$ and $\mathbf{z}$.

6. Generalized Chebyshev kernels

Here, we propose a generalized way of expressing the kernel function in order to clarify the ambiguity about how to implement Chebyshev kernels. To the best of our knowledge, there is no previous work defining the Chebyshev polynomials recursively for vector inputs. Therefore, for vector inputs, we define the generalized Chebyshev polynomials as

$$T_0(\mathbf{x}) = 1, \quad T_1(\mathbf{x}) = \mathbf{x}, \quad T_n(\mathbf{x}) = 2\mathbf{x}\,T_{n-1}^{T}(\mathbf{x}) - T_{n-2}(\mathbf{x}), \quad n = 2, 3, 4, \ldots \qquad (16)$$

where $T_{n-1}^{T}$ is the transpose of $T_{n-1}(\mathbf{x})$ and $\mathbf{x}$ is a row vector. Therefore, if the polynomial order $n$ is an odd number, the generalized Chebyshev polynomial $T_n(\mathbf{x})$ yields a row vector; otherwise, it yields a scalar value. Thus, by using the generalized Chebyshev polynomials, we define the generalized $n$th order Chebyshev kernel as

$$K(\mathbf{x},\mathbf{z}) = \frac{\sum_{j=0}^{n} T_j(\mathbf{x})\,T_j^{T}(\mathbf{z})}{\sqrt{a - \langle \mathbf{x},\mathbf{z}\rangle}}, \quad a = m \qquad (17)$$

where $\mathbf{x}$ and $\mathbf{z}$ are $m$-dimensional vectors.
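A minimal Python sketch of the recursion in Eq. (16) and the kernel in Eq. (17), with $a$ set to the input dimension $m$ (the choice is justified just below); the helper names are ours, and the code is an illustration under our reading of the definitions, not the authors' original implementation.

```python
import numpy as np

def generalized_chebyshev_terms(x, n):
    """Generalized Chebyshev polynomials T_0(x), ..., T_n(x) of Eq. (16).

    T_0 = 1, T_1 = x, T_k = 2 x T_{k-1}^T - T_{k-2}; odd orders are row
    vectors, even orders are scalars.
    """
    x = np.asarray(x, dtype=float)
    terms = [1.0, x]
    for _ in range(2, n + 1):
        prev, prev2 = terms[-1], terms[-2]
        if np.ndim(prev) == 0:                           # previous term is a scalar
            terms.append(2.0 * x * prev - prev2)         # result is a row vector
        else:                                            # previous term is a row vector
            terms.append(2.0 * np.dot(x, prev) - prev2)  # result is a scalar
    return terms[:n + 1]

def generalized_chebyshev_kernel(x, z, n):
    """Generalized nth order Chebyshev kernel of Eq. (17) with a = m."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    m = x.shape[0]
    tx, tz = generalized_chebyshev_terms(x, n), generalized_chebyshev_terms(z, n)
    num = sum(np.dot(np.atleast_1d(tj_x), np.atleast_1d(tj_z))
              for tj_x, tj_z in zip(tx, tz))
    return num / np.sqrt(m - np.dot(x, z))
```

For inputs normalized to $[-1,1]$, working out the $n = 2$ and $n = 3$ cases of this recursion reproduces the corresponding rows of Table 1 below.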
In Eq. (17), the denominator must be greater than zero:

$$\sqrt{a - \langle \mathbf{x},\mathbf{z}\rangle} > 0 \qquad (18)$$

To satisfy (18), note that since each element of the vectors $\mathbf{x}$ and $\mathbf{z}$ takes a value in $[-1, 1]$, the maximum value of the inner product $\langle \mathbf{x},\mathbf{z}\rangle$ is $\sum_{i=1}^{m} 1 = m$; thus the minimum admissible value of $a$ is $m$, the dimension of the input vector $\mathbf{x}$. As a result, the 6th order generalized Chebyshev kernel can be written as

$$K(\mathbf{x},\mathbf{z}) = \frac{1 + c + (2a-1)(2b-1) + c(4a-3)(4b-3) + (8a^2-8a+1)(8b^2-8b+1) + c(16a^2-20a+5)(16b^2-20b+5) + (32a^3-48a^2+18a-1)(32b^3-48b^2+18b-1)}{\sqrt{m-c}} \qquad (19)$$

where $a = \langle \mathbf{x},\mathbf{x}\rangle$, $b = \langle \mathbf{z},\mathbf{z}\rangle$, and $c = \langle \mathbf{x},\mathbf{z}\rangle$. The kernel functions up to 4th order are listed in Table 1.

Table 1. List of the generalized Chebyshev kernel functions up to 4th order, where $a = \langle \mathbf{x},\mathbf{x}\rangle$, $b = \langle \mathbf{z},\mathbf{z}\rangle$, $c = \langle \mathbf{x},\mathbf{z}\rangle$, and $m$ is the input dimension.

n = 0: $K(\mathbf{x},\mathbf{z}) = 1/\sqrt{m-c}$
n = 1: $K(\mathbf{x},\mathbf{z}) = (1+c)/\sqrt{m-c}$
n = 2: $K(\mathbf{x},\mathbf{z}) = \bigl(1+c+(2a-1)(2b-1)\bigr)/\sqrt{m-c}$
n = 3: $K(\mathbf{x},\mathbf{z}) = \bigl(1+c+(2a-1)(2b-1)+c(4a-3)(4b-3)\bigr)/\sqrt{m-c}$
n = 4: $K(\mathbf{x},\mathbf{z}) = \bigl(1+c+(2a-1)(2b-1)+c(4a-3)(4b-3)+(8a^2-8a+1)(8b^2-8b+1)\bigr)/\sqrt{m-c}$

Fig. 1 shows the generalized Chebyshev kernel output $K(z,x)$ for various kernel parameters, where $z$ varies within the range $[-0.999, 0.999]$ and $x$ is fixed at a constant value. Fig. 1(a) and (b) shows the kernel function $K(z,-0.77)$, Fig. 1(c) and (d) shows $K(z,0)$, and Fig. 1(e) and (f) shows $K(z,0.77)$ for various kernel parameters. Fig. 1(a), (c) and (e) shows the results obtained with Eq. (20):

$$K(\mathbf{x},\mathbf{z}) = \frac{T_n(\mathbf{x})\,T_n^{T}(\mathbf{z})}{\sqrt{m - \langle \mathbf{x},\mathbf{z}\rangle}} \qquad (20)$$

Notice that Eq. (20) differs from Eq. (17) in that it does not contain the summation term. Fig. 1(b), (d) and (f) shows the results obtained with Eq. (17). One can observe that, without the sum, the kernel function cannot be used in SVM for similarity purposes.

[Fig. 1. x-value vs. kernel output for three fixed values, i.e., K(x, -0.77), K(x, 0) and K(x, 0.77), for various kernel parameters; panels (a), (c) and (e) use Eq. (20), panels (b), (d) and (f) use Eq. (17).]

The generalized Chebyshev kernel is not necessarily symmetric around the fixed x-value and can be asymmetric, as shown in Fig. 1(b) and (f). Unlike the Gaussian kernel, whose shape can be altered only by the kernel parameter, the Chebyshev kernels also alter their shape based on the input values. The following two subsections briefly discuss the validity and the robustness of the generalized Chebyshev kernel.

6.1. Validity

To be a valid SVM kernel, a kernel should satisfy the Mercer conditions [1,12]. If the kernel does not satisfy the Mercer conditions, SVM may not find the optimal parameters but may instead find suboptimal ones. Also, if the Mercer conditions are not satisfied, the Hessian matrix in the optimization step may not be positive definite. Therefore, we examine whether the generalized Chebyshev kernel satisfies the Mercer conditions.

Mercer theorem: To be a valid SVM kernel, for any finite function $g(x)$, the following integral must always be nonnegative for the given kernel function $K(\mathbf{x},\mathbf{z})$ [1]:

$$\iint K(\mathbf{x},\mathbf{z})\,g(\mathbf{x})\,g(\mathbf{z})\,d\mathbf{x}\,d\mathbf{z} \ge 0 \qquad (21)$$
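The Mercer condition of Eq. (21) is hard to verify numerically in full, but a common practical surrogate is to check that Gram matrices built from the kernel are positive semi-definite. The sketch below performs this check on random data normalized to $[-1, 1]$, reusing the `generalized_chebyshev_kernel` function sketched earlier; it is our own diagnostic, not part of the paper's proof.

```python
import numpy as np

def gram_is_psd(kernel, X, tol=1e-8):
    """Build the Gram matrix K_ij = K(x_i, x_j) and check whether its
    smallest eigenvalue is non-negative up to a numerical tolerance."""
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    eigvals = np.linalg.eigvalsh((K + K.T) / 2.0)  # symmetrize before the eigendecomposition
    return eigvals.min() >= -tol

# Random data normalized to [-1, 1], as required by Eq. (14)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 5))
print(gram_is_psd(lambda x, z: generalized_chebyshev_kernel(x, z, n=3), X))
```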
Proposition: The product of two valid kernels is also a valid kernel [12]. Therefore, we can express the $n$th order Chebyshev kernel as a product of two kernel functions:

$$K(\mathbf{x},\mathbf{z}) = K^{(1)}(\mathbf{x},\mathbf{z})\,K^{(2)}(\mathbf{x},\mathbf{z}) \qquad (22)$$

where

$$K^{(1)}(\mathbf{x},\mathbf{z}) = \sum_{j=0}^{n} T_j(\mathbf{x})\,T_j^{T}(\mathbf{z}) = T_0(\mathbf{x})T_0^{T}(\mathbf{z}) + T_1(\mathbf{x})T_1^{T}(\mathbf{z}) + \cdots + T_n(\mathbf{x})T_n^{T}(\mathbf{z}) \qquad (23)$$

and

$$K^{(2)}(\mathbf{x},\mathbf{z}) = \frac{1}{\sqrt{m - \langle \mathbf{x},\mathbf{z}\rangle}} \qquad (24)$$

Consider that $g(x)$ is a function where $g: \mathbb{R} \to \mathbb{R}$. Then we can evaluate and verify the Mercer condition for $K^{(1)}(\mathbf{x},\mathbf{z})$ as follows, assuming each element is independent of the others:

$$\iint K^{(1)}(\mathbf{x},\mathbf{z})\,g(\mathbf{x})\,g(\mathbf{z})\,d\mathbf{x}\,d\mathbf{z} = \iint \sum_{j=0}^{n} T_j(\mathbf{x})T_j^{T}(\mathbf{z})\,g(\mathbf{x})\,g(\mathbf{z})\,d\mathbf{x}\,d\mathbf{z} = \sum_{j=0}^{n} \iint T_j(\mathbf{x})T_j^{T}(\mathbf{z})\,g(\mathbf{x})\,g(\mathbf{z})\,d\mathbf{x}\,d\mathbf{z} = \sum_{j=0}^{n} \int T_j(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} \int T_j^{T}(\mathbf{z})g(\mathbf{z})\,d\mathbf{z} = \sum_{j=0}^{n} \int T_j(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} \int T_j^{T}(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} \ge 0 \qquad (25)$$

Therefore, the kernel $K^{(1)}(\mathbf{x},\mathbf{z})$ is a valid kernel.

Theorem (power series of dot product kernels): If a kernel is a function of the dot product,

$$K(\mathbf{x},\mathbf{z}) = K(\langle \mathbf{x},\mathbf{z}\rangle) \qquad (26)$$

then its power series expansion [12], $K(t) = \sum_{j=0}^{\infty} a_j t^j$ with $t = \langle \mathbf{x},\mathbf{z}\rangle$, is a positive definite kernel if $a_j \ge 0$ for all $j$.

Since $K^{(2)}(\mathbf{x},\mathbf{z})$ is a function of the inner product $t$, we can find its Maclaurin expansion and the expansion coefficients. If all the coefficients of the expansion are non-negative, then this kernel is a valid kernel:

$$K(t) = K(0) + \sum_{j=1}^{\infty} \frac{K^{(j)}(0)}{j!}\,t^{j} \qquad (27)$$

where the $j$th derivative at zero is

$$K^{(j)}(0) = \left.\frac{d^{j} K(t)}{dt^{j}}\right|_{t=0} = \frac{\prod_{k=1}^{j} (2k-1)}{2^{j}\,m^{(2j+1)/2}} \qquad (28)$$

So the Maclaurin expansion takes the form

$$K^{(2)}(\mathbf{x},\mathbf{z}) = \frac{1}{\sqrt{m}} + \sum_{j=1}^{\infty} \frac{m^{-(2j+1)/2}\,\langle \mathbf{x},\mathbf{z}\rangle^{j}}{2^{j}\,j!}\prod_{k=1}^{j}(2k-1) \qquad (29)$$

where each coefficient is positive since $m \ge 1$; hence $K^{(2)}(\mathbf{x},\mathbf{z})$ is also a valid kernel function (a small numerical check of this expansion is sketched at the end of this section). As a result, the kernel $K(\mathbf{x},\mathbf{z}) = K^{(1)}(\mathbf{x},\mathbf{z})\,K^{(2)}(\mathbf{x},\mathbf{z})$ is also a valid kernel.

6.2. Robustness

The previously proposed Chebyshev kernel suffered from ill-posed problems, as mentioned in [11]. The main reason for these problems is the square root in the denominator of the kernel function. When the value of $\sqrt{1-xz}$ is very close to zero, the kernel value can become extremely large, which adversely affects the Hessian matrix and may force it to become singular. Although a small $\varepsilon$ value can be added to this expression to avoid division by zero, it still affects the Hessian matrix and may need to be re-tuned for each dataset. Alternatively, using another function such as $\sqrt{q - xz}$ helps to overcome this problem; however, such a function reduces the performance of the previously proposed Chebyshev kernel for SVM, because as the $q$ value increases, the effect of the denominator vanishes by converging to a constant value. The generalized Chebyshev kernel largely solves this problem, as it uses the function $\sqrt{m - \langle \mathbf{x},\mathbf{z}\rangle}$ as the denominator. Under the assumption that the elements and vectors are i.i.d., the probability of this function being zero is smaller than the probability of $\sqrt{1-xz}$ being zero for $m > 1$. Therefore, in real-world applications, where often $m \gg 1$, the ill-posed problem caused by a zero denominator is eliminated with higher probability if one employs the generalized Chebyshev kernel function. In the experiments, we also show that the generalized Chebyshev kernel is more robust with respect to the kernel parameter than the Chebyshev kernel.
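As referenced above, the coefficients in Eqs. (28) and (29) can be sanity-checked by comparing a truncated Maclaurin series of $K^{(2)}(t) = (m - t)^{-1/2}$ against the closed form; the values of `m` and `t` below are arbitrary examples of ours, not values from the paper.

```python
import math

def k2_series(t, m, terms=20):
    """Truncated Maclaurin expansion of K2(t) = (m - t)^(-1/2), Eq. (29)."""
    total = 1.0 / math.sqrt(m)
    for j in range(1, terms):
        odd_product = math.prod(2 * k - 1 for k in range(1, j + 1))  # (2j - 1)!!
        total += odd_product * t**j / (2**j * math.factorial(j) * m ** ((2 * j + 1) / 2))
    return total

m, t = 5, 0.8                            # example dimension and inner product value
print(k2_series(t, m), (m - t) ** -0.5)  # the two values should agree closely
```

Every term added in the loop is positive, in line with the argument above that all expansion coefficients are non-negative.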
7. Modified Chebyshev kernels

By introducing the generalized Chebyshev polynomials, we make it easier to derive new kernel functions from the generalized Chebyshev kernels. Based on the generalized Chebyshev polynomials, one can construct new kernel functions or modify the generalized Chebyshev kernel as needed. The generalized Chebyshev polynomials of the second kind, $U(\mathbf{x})$, as defined in [14], can also be used to create another set of kernel functions with a different weighting function:

$$K(\mathbf{x},\mathbf{z}) = \frac{\sum_{j=0}^{n} U_j(\mathbf{x})\,U_j^{T}(\mathbf{z})}{\sqrt{a - \langle \mathbf{x},\mathbf{z}\rangle}} \qquad (30)$$

However, preliminary tests gave results very similar to those of the kernel based on the first-kind generalized Chebyshev polynomials; therefore, in this study we do not include Eq. (30) in the tests.

Here we provide an example of how to modify the generalized Chebyshev kernels by replacing the weighting function of the generalized Chebyshev kernel with an exponential function. As exponential functions can decay faster than the square root function, we replace the weighting function $K^{(2)}(\mathbf{x},\mathbf{z})$ of Eq. (22) with an exponential function (which is a Gaussian kernel) to obtain a more nonlinear kernel that better captures the nonlinearity along decision surfaces whose shape changes rapidly. For this purpose, if we use the function $K^{(2)}(\mathbf{x},\mathbf{z}) = \exp(-\gamma\|\mathbf{x}-\mathbf{z}\|^2)$, which is a Gaussian kernel with $\sigma^2 = (2\gamma)^{-1}$, we obtain another valid SVM kernel function. The resulting new kernel becomes:

$$K(\mathbf{x},\mathbf{z}) = \sum_{j=0}^{n} T_j(\mathbf{x})\,T_j^{T}(\mathbf{z})\,\exp\!\left(-\gamma\|\mathbf{x}-\mathbf{z}\|^2\right) \qquad (31)$$

where $n$ is the Chebyshev polynomial order and $\gamma$ is the decay parameter.

8. Data normalization

Data normalization plays an important role for the generalized Chebyshev kernel, as $K^{(2)}(\mathbf{x},\mathbf{z})$ may become complex valued if the condition $\sqrt{m - \langle \mathbf{x},\mathbf{z}\rangle} \ge 0$ cannot always be satisfied. By normalizing the $\mathbf{x}$ and $\mathbf{z}$ values according to Eq. (14), this condition is always satisfied (a short sketch is given below). Also, the Chebyshev polynomials are defined only for values between [-1, 1].
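A minimal sketch of the normalization of Eq. (14) and of the modified Chebyshev kernel of Eq. (31); it reuses the `generalized_chebyshev_terms` helper sketched earlier, and the default `gamma` value is only an assumed example.

```python
import numpy as np

def normalize_to_unit_interval(X, data_min, data_max):
    """Eq. (14): map raw feature values into [-1, 1] using the dataset extremes."""
    return 2.0 * (X - data_min) / (data_max - data_min) - 1.0

def modified_chebyshev_kernel(x, z, n, gamma=1.0):
    """Eq. (31): generalized Chebyshev numerator weighted by a Gaussian factor."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    tx, tz = generalized_chebyshev_terms(x, n), generalized_chebyshev_terms(z, n)
    num = sum(np.dot(np.atleast_1d(tj_x), np.atleast_1d(tj_z))
              for tj_x, tj_z in zip(tx, tz))
    d = x - z
    return num * np.exp(-gamma * np.dot(d, d))
```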

Related articles

A COMPARATIVE ANALYSIS OF WAVELET-BASED FEMG SIGNAL DENOISING WITH THRESHOLD FUNCTIONS AND FACIAL EXPRESSION CLASSIFICATION USING SVM AND LSSVM

This work presents a technique for the analysis of facial electromyogram signal activities to classify five different facial expressions for computer-muscle interfacing applications. Facial electromyogram (FEMG) is a technique for recording the asynchronous activation of neurons inside the face muscles with non-invasive electrodes. FEMG pattern recognition is a difficult task for the researche...


Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery

Land use is considered a key element in land change studies, environmental planning, and natural resource applications. Studying the Earth's surface by remote sensing has many benefits, such as continuous data acquisition, broad regional coverage, cost-effective data, accurate mapping, and large archives of historical data. To study land use / cover, remote sensing as an effic...


MODELING OF FLOW NUMBER OF ASPHALT MIXTURES USING A MULTI–KERNEL BASED SUPPORT VECTOR MACHINE APPROACH

The flow number of asphalt–aggregate mixtures has been proposed as an explanatory factor for assessing the rutting potential of asphalt mixtures. This study proposes a multiple-kernel based support vector machine (MK-SVM) approach for modeling the flow number of asphalt mixtures. The MK-SVM approach consists of a weighted least squares support vector machine (WLS-SVM) integrating two kernel funct...


MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM

Medical image segmentation partitions an image into a set of regions that are visually distinct and consistent with respect to properties such as gray level, texture, or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...


Common Spatial Patterns Feature Extraction and Support Vector Machine Classification for Motor Imagery with the SecondBrain

Recently, a large amount of electroencephalography (EEG) data has been generated by several high-quality labs worldwide and made freely available to researchers. On the other hand, many neuroscience researchers need these data to study different neural disorders for better diagnosis and for evaluating treatment. However, some format adaptation and pre-processing are necessary before ...


Acoustic detection of apple mealiness based on support vector machine

Mealiness degrades the quality of apples and plays an important role in the fruit market. Therefore, the use of reliable and rapid sensing techniques for nondestructive measurement and sorting of fruits is necessary. In this study, the potential of acoustic signals of apples rolling on an inclined plate as a new technique for nondestructive detection of Red Delicious apple mealiness was investigate...



Journal:
  • Pattern Recognition

Volume 44

Pages 1435–1447

Publication year 2011